Metabolites profiling, in-vitro and molecular docking studies of five legume seeds for Alzheimer’s disease

Even though legumes are valuable medicinal plants with edible seeds that are extensively consumed worldwide, there is little information available on the metabolic variations between different dietary beans and their influence as potential anti-cholinesterase agents. High-resolution liquid chromatography coupled with mass spectrometry in positive and negative ionization modes combined with multivariate analysis were used to explore differences in the metabolic profiles of five commonly edible seeds, fava bean, black-eyed pea, kidney bean, red lentil, and chickpea. A total of 139 metabolites from various classes were identified including saponins, alkaloids, phenolic acids, iridoids, and terpenes. Chickpea showed the highest antioxidant and anti-cholinesterase effects, followed by kidney beans. Supervised and unsupervised chemometric analysis determined that species could be distinguished by their different discriminatory metabolites. The major metabolic pathways in legumes were also studied. Glycerophospholipid metabolism was the most significantly enriched KEGG pathway. Pearson’s correlation analysis pinpointed 18 metabolites that were positively correlated with the anti-cholinesterase activity. Molecular docking of the biomarkers to the active sites of acetyl- and butyryl-cholinesterase enzymes revealed promising binding scores, validating the correlation results. The present study will add to the metabolomic analysis of legumes and their nutritional value and advocate their inclusion in anti-Alzheimer’s formulations.

Legumes are an essential part of the diet in several countries worldwide, particularly as a source of dietary protein in developing countries 5. Recently, interest in legumes has increased because of their beneficial or protective actions on human health. Many studies have reported that frequent consumption of legumes reduces the risk of type 2 diabetes, certain types of cancer, cardiovascular disease, and overweight or obesity 6. These activities could be attributed to the nutritional composition of pulses and their bioactive secondary compounds such as flavonoids, phenolic acids, tannins, and saponins, among others 7,8. The most consumed legumes (Fabaceae family) are chickpeas (Cicer arietinum L.), lentils (Lens culinaris Medik), and beans (Phaseolus vulgaris L.).
Due to the complicated metabolic nature of plant matrices, untargeted metabolomics is increasingly popular in food analysis and regarded as the preferred technique for metabolic profiling of plant metabolites 9,10. It helps researchers better comprehend the complexity of these mixtures by providing an objective method for comparing metabolite profiles between groups, enabling the detection of new markers to fight food fraud 11.
Over the last few years, metabolomics approaches have been engaged in the field of food sciences as powerful tools to ensure food safety, quality, and traceability 12. This has brought attention to the need to develop and apply techniques such as metabolomics that make it possible to stay abreast of the new requirements of the food market 12,13. Metabolomics approaches allow the industry to analyze food quality and authenticity 12 and aid in determining which metabolites are most relevant for certain biological activities 14. To fully exploit, analyze, and extract valuable information from the vast amounts of chemical data produced, chemometric tools such as principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) have become a must 9. In a nutshell, non-targeted metabolic analysis using UHPLC-Q-TOF-MS in conjunction with chemometrics can be a powerful strategy for detecting multi-class components, finding reliable markers for distinguishing between closely related species, and understanding differences in their biological potential.
Several previous LC-MS/MS studies on legume seeds have used targeted metabolomics analysis 15 directed at specific metabolite classes, such as flavonoids 16, anthocyanins 17, and saponins 18, or at particular Leguminosae species, such as faba beans 19, chickpeas 20, and lentils 21. However, there is relatively little information available on the full phytochemical composition of numerous legumes obtained with untargeted metabolomic methods 22.
This study is the first to provide comprehensive metabolite fingerprints of five edible legume seeds (fava beans, black-eyed peas, kidney beans, chickpeas, and lentils) using an untargeted LC-MS-based metabolomics approach combined with chemometrics as a useful tool to assess metabolite heterogeneity and trace the bioactive compounds, contributing to the medicinal significance of these highly consumed seeds in human diets. Furthermore, the Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis indicated the most significant biochemical metabolic pathways implicated by the identified metabolites, which contributed to the metabolic discrepancy among the samples.
Our study aimed to search for additional nutraceuticals that may help prevent and diminish the symptoms of Alzheimer's disease. To this end, the comprehensive profiles of five legumes commonly used in the human diet, Vicia faba L. (fava beans, FB), Vigna unguiculata (L.) Walp. (black-eyed pea, BP), Phaseolus vulgaris L. (common bean, kidney bean, KB), Lens culinaris L. (lentils, red lentil, RL), and Cicer arietinum L. (chickpea, CP), were investigated using LC/MS-based metabolomics. The antioxidant and anticholinergic effects of the seeds were determined using multiple in-vitro assays. The differentially accumulated metabolites and the top enriched metabolic pathways in the studied seeds were annotated. The metabolite-bioactivity correlation was determined using Pearson's correlation. In addition, molecular docking simulations were used to examine the binding affinities and interactions of the identified compounds with the AChE and BuChE active sites. The paper may represent the first comprehensive account of the multi-targeted potential of these legumes, applying metabolomics to establish which compounds discriminate between the five studied legumes (FB, BP, KB, RL, and CP) while assessing their health-promoting potential against Alzheimer's disease, to optimize their use in the food, supplement, and pharmaceutical industries.

Plant material and extraction
Seeds of five Fabaceae species, Vicia faba L. (fava beans, FB), Vigna unguiculata (L.) Walp. (black-eyed pea, BP), Phaseolus vulgaris L. (common bean, kidney bean, KB), Lens culinaris L. (lentils, red lentil, RL), and Cicer arietinum L. (chickpea, CP), were obtained from the Agricultural Research Center, Giza, Egypt, in February 2022. The plant collection complies with relevant institutional, national, and international guidelines and legislation. Samples were kept in the Herbarium of the Pharmacognosy Department, Faculty of Pharmacy, Cairo University. Voucher specimen numbers were 23-9-2019 I-V. The dried seeds (100 g) were ground in a coffee grinder (Black & Decker SmartGrind Model CBG5, The Black & Decker Corporation, USA) to a fine powder that would pass through a 0.5 mm screen. The powder of each studied plant was extracted with methanol (500 mL × 4) till exhaustion. Solvents were removed with a rotary evaporator at 50 ℃ to obtain the extracts in nearly an hour. Ten mg of the solid extracts were dissolved in 1 mL of methanol, centrifuged at 13000 g for 10 min, filtered (syringe nylon filter, 0.22 µm pore diameter), and kept at −20 ℃ till further LC-MS analysis. For each sample, 3 biological replicates were extracted in parallel under the same conditions. For biological analysis, 20 mg of dried methanolic extracts were dissolved in DMSO.

FRAP
The reaction mixture consisted of TPTZ solution (0.25 mL; 10 mM) in HCl (40 mM), FeCl3 (0.25 mL; 20 mM), the samples at different concentrations, and acetate buffer (2.5 mL; 300 mM, pH 3.6). The FRAP reagent (170 μL) and the tested samples (20 μL) were mixed in a 96-well plate and incubated for 30 min in the dark, after which the absorbance was measured at 593 nm.

DPPH
2,2-Diphenyl-1-picrylhydrazyl (DPPH) was prepared at a concentration of 0.04 g% in methanol. All samples were prepared in MeOH. Then, 20 μL of the different concentrations of the samples were added to 200 μL of DPPH in a 96-well plate. The absorbance was measured at 492 nm after incubating for 30 min in the dark at room temperature.

ABTS
A working solution of ABTS•+ radicals was prepared by reacting ABTS (7 mM) with potassium persulfate (2.5 mM) at a 1:1 (v/v) ratio. The reaction mixture was stored in the dark at room temperature for 15 h. This solution was diluted with ethanol until an absorbance of 0.70 was recorded at 734 nm. The different concentrations of the samples (20 μL) were added to the ABTS solution (200 μL). The mixture was incubated in the dark for 30 min, and the absorbance was measured at 734 nm.
For all the assays, Trolox was used as a standard (10-1000 μg mL⁻¹), and the experiments were performed in triplicate. The following dilutions of the different samples were prepared (7.8125-1000 μg mL⁻¹). All the results were expressed as IC50

Cholinesterase inhibitory activity determination
Cholinesterase inhibitory activities (AChE and BuChE) were determined according to the standard technique 31-33. In brief, 170 μL of tris-HCl buffer (200 mM, pH 7.5) was added, followed by 20 μL of the tested samples (7.8125-250 μg mL⁻¹) and then 20 μL of the enzyme solution (0.1 U mL⁻¹). After incubation for 10 min at 25 ℃, 40 μL of DTNB and 20 μL of the substrate (1.11 mM) were added. Butyrylthiocholine iodide and acetylthiocholine were used as substrates in the BuChE and AChE assays, respectively, with DTNB serving as the indicator. All samples were prepared in methanol. The intensity of the developed color was measured at 405 nm using a microplate reader (reading A), and a control without the inhibitor was also measured (reading B). Blank assays were run by replacing the enzyme (20 μL) with buffer, and their absorbances were recorded to correct for the spontaneous lysis of the indicator or the inherent color of the inhibitor. Linear regression was used for the calculation of the IC50 (50% inhibitory concentration). % Inhibition = [1 − (corrected A / corrected B)] × 100, where corrected A = reading A − blank and corrected B = reading B − blank. Donepezil was used as the standard (0.48-15.625 μg mL⁻¹).
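The inhibition formula and the IC50 estimate described above can be sketched in a few lines of Python. This is only an illustration: the text does not state whether the regression was run on raw or log-transformed concentrations, so regression on raw concentration is an assumption here, and the function names and the example numbers are hypothetical.

```python
def percent_inhibition(reading_a, blank_a, reading_b, blank_b):
    """% Inhibition = [1 - (corrected A / corrected B)] * 100."""
    corrected_a = reading_a - blank_a  # sample well minus its blank
    corrected_b = reading_b - blank_b  # inhibitor-free control minus its blank
    return (1.0 - corrected_a / corrected_b) * 100.0


def ic50_linear(concentrations, inhibitions):
    """Fit a least-squares line to (concentration, % inhibition) and
    return the concentration at which the line crosses 50 %.

    Assumption: regression on raw (not log-transformed) concentrations.
    """
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(inhibitions) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, inhibitions))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return (50.0 - intercept) / slope


# Hypothetical absorbance readings and dose-response points (not from the study).
inhibition = percent_inhibition(0.30, 0.05, 0.55, 0.05)
ic50 = ic50_linear([10, 20, 30, 40], [20, 40, 60, 80])
```

A straight-line fit is only reasonable over the roughly linear part of a dose-response curve, so in practice the points bracketing 50% inhibition would be the ones to use.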

UHPLC analysis and data processing
Ultra-high-performance liquid chromatograms (UHPLC) were obtained on an Agilent LC-MS system composed of an Agilent 1290 Infinity II UHPLC coupled to an Agilent 6545 ESI-Q-TOF-MS operated in both negative and positive modes, following the previously described method 25,34.
Raw data of the three biological replicates acquired from the UHPLC-QTOF-MS/MS were first converted to the mzML format by msConvert software and then processed using the MZmine 3 software (http://mzmine.github.io/) for peak extraction 35. Mass ion peaks were isolated with a centroid detector threshold with the noise level set to 1.0 × 10² and an MS level of 1. Chromatogram builder was used with a minimum time span set to 0.1 min, and the minimum height and m/z tolerance set to 1 × 10⁴ (positive mode), 5 × 10³ (negative mode), and 0.001 m/z or 5.0 ppm, respectively. Chromatogram deconvolution was performed using a baseline cut-off algorithm with minimum peak height: 1 × 10⁴ (positive mode), 5 × 10³ (negative mode), peak duration range (0-0.4 min), and baseline level: 5 × 10². The separated peaks were then deisotoped using the isotopic peaks grouper (m/z tolerance: 0.001 m/z or 5.0 ppm, retention time tolerance: 0.2 absolute (min), maximum charge: 2, and representative isotope: most intense). The parameters for data filtering, gap-filling, and the retention time normalizer were set to m/z tolerance: 0.001 m/z or 5.0 ppm, retention time tolerance: 0.2 absolute (min), and minimum standard intensity: 1 × 10⁴ (positive mode), 5 × 10³ (negative mode). The peak lists were all aligned using the join aligner with m/z tolerance: 0.001 m/z or 5.0 ppm, weight for m/z: 20, retention time tolerance: 5.0 relative (%), weight for RT: 20. An adduct search was performed for Na+, K+, NH4+, formate, and ACN+ (RT tolerance: 0.2 absolute (min), m/z tolerance: 0.001 m/z or 5.0 ppm, max relative adduct peak height: 50%). The processed data

Statistical analysis
The obtained CSV file, including the normalized peak areas and identities (retention time (tr), mass-to-charge (m/z), molecular formula, and name) of the 130 metabolites identified in the three replicates of the five samples, was exported to the online platform MetaboAnalyst 5 (https://www.metaboanalyst.ca/MetaboAnalyst/ModuleView.xhtml) for further chemometric analysis, including hierarchical clustering analysis (HCA) with a heatmap, volcano plot, principal component analysis (PCA), and partial least squares discriminant analysis (PLS-DA). Moreover, the variable importance in the projection (VIP) values were calculated to identify the metabolites that significantly differentiated between the five legume samples. For statistical significance, one-way ANOVA and Tukey's HSD post-hoc analysis with a p-value threshold of less than 0.05 were selected. Furthermore, the biological significance of the identified metabolites was assessed through metabolite set enrichment analysis using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. The metabolic pathways are displayed according to their significance, arranged by p-values (y-axis, pathway enrichment analysis) or pathway impact (x-axis, pathway topology analysis). The dot color is based on the p-value, and the dot size is defined by the pathway impact values calculated from the matched metabolites. The biomarker compounds correlated to the anti-cholinesterase activity were determined using Pearson's correlations.
For biological studies, data are presented as mean ± standard deviation (SD) and the comparison is made by the one-way ANOVA (variance analysis) with post-hoc Tukey HSD test for multiple comparisons at a significance level of 0.05.
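The metabolite-activity screening with Pearson's correlation can be illustrated with a small, self-contained sketch. How the activity was encoded for the correlation (for example IC50, its reciprocal, or % inhibition) is not stated in the text, so the variable roles and all numbers below are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical normalized peak areas of one metabolite in the five extracts
# and a hypothetical activity score (higher = stronger inhibition).
peak_areas = [0.8, 1.1, 2.9, 1.0, 3.4]
activity = [12.0, 15.0, 41.0, 14.0, 47.0]
r = pearson_r(peak_areas, activity)  # close to +1 for these made-up values
```

A metabolite would be flagged as positively correlated when r is positive and passes whatever significance cut-off is chosen; that cut-off is a design choice not specified here.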

Two molecular docking studies
The identified compounds were docked against both AChE and BuChE target enzymes and compared to their co-crystals as well. This was performed to clarify the anti-Alzheimer effects of the investigated compounds at the molecular level using in silico drug design tools 36,37. The chemical structures of the tested metabolites were created in ChemDraw and transferred to the working window for energy minimization and correction 38. The target receptors (AChE and BuChE) were obtained from the Protein Data Bank (https://www.rcsb.org/structure/4EY7 and https://www.rcsb.org/structure/8CGO, respectively). Each target was prepared for the docking step through the correction of errors, 3D hydrogenation, and energy minimization steps 39,40. The previously prepared compounds were inserted in two different databases with the co-crystal of each target receptor as a reference 41. The superior candidates were selected according to their binding modes, binding scores, and RMSD (root mean square deviation) 42. In addition, two validation processes were carried out by redocking the co-crystal of each target receptor within its binding site, which showed binding modes similar to the native co-crystals as well as low RMSD values (< 2 Å) 40.
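The redocking check rests on the RMSD between the redocked pose and the native co-crystal pose. A minimal sketch of that calculation follows; it assumes both poses list the same heavy atoms in the same order and share the receptor's coordinate frame (no superposition or symmetry correction), and the coordinates are invented for illustration. It is not the docking software's own routine.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (in the input units, e.g. Å) between two
    poses given as equal-length lists of (x, y, z) tuples.

    Assumption: atom i in coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical three-atom poses; a real ligand would have many more atoms.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
redocked = [(0.2, 0.1, 0.0), (1.6, 0.1, 0.1), (1.4, 1.7, 0.1)]
pose_is_valid = rmsd(native, redocked) < 2.0  # the < 2 Å criterion from the text
```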

In-vitro biological activities
Antioxidant and anti-Alzheimer activities of the tested samples are summarized in Table 1. Generally, the studied legume methanolic extracts showed good antioxidant and anti-Alzheimer activities, with chickpea seeds giving nearly the best results, as revealed by their lower IC50 values, which were close to those of the standard (Table 1). In the antioxidant assays (ABTS, FRAP, and DPPH), all the samples showed better results than the standard (Trolox). In the AChE assay, CP showed the highest activity among the tested samples relative to the reference drug (Donepezil), followed by the KB sample (Table 1). In the BuChE assay, the studied legume methanolic extracts showed strong activity 43 compared to the reference drug, with CP and KB exhibiting the best results.
Legumes are a source of biologically active compounds classified as phenolic acids, flavanols, flavan-3-ols, tocopherols, anthocyanins/anthocyanidins, vitamin C, and condensed tannins/proanthocyanidins, which are responsible for their antioxidant activity 44,45. [...] of BP using tests such as DPPH, ABTS, and FRAP, and the results were significantly lower than the reference drugs (quercetin, butylated hydroxyanisole, and Trolox) 46. It is worth highlighting that the studied legume methanolic extracts were investigated for the first time as ChE inhibitors.

UHPLC/MS metabolite profiling
Legumes are a habitual part of the diet in several countries worldwide, especially as a source of phytochemicals and nutrients. The methanolic extracts of five commonly used seeds of the Fabaceae family were analyzed using UHPLC-Q-TOF-MS in both negative and positive ionization modes to obtain their comprehensive profiles and correlate their metabolites with anticholinergic potential. The representative base peak chromatograms of the samples are shown in Supplementary Materials Fig. S1. The identification was based on several databases: the MassBank of North America (https://mona.fiehnlab.ucdavis.edu/), LOTUS: Natural Products Online Database (https://lotus.naturalproducts.net/), Human Metabolome Database (http://www.hmdb.ca/), KEGG (https://www.genome.jp/kegg/kegg1.html), and LipidMaps (https://www.lipidmaps.org/), in addition to a literature review of the reported chemical constituents in the Fabaceae family 22,47-51. In total, 139 compounds were tentatively identified at the MS² level of confidence 52,53 and characterized by their retention time (tr), accurate molecular monoisotopic mass, and MS/MS spectra in the five extracts. The details on their accurate masses, molecular formulas, retention time, tentative identification, and chemical class are shown in Table S1, Fig. S1. These compounds can be classified into several chemical classes, including amino acids and amine derivatives, carbohydrates, phenolic acids, alkaloids, anthocyanins, saponins, fatty acids and fatty acyls, flavonoids, isoflavonoids, lignans, iridoids, stilbenes, sterols and terpenes, benzophenones, and phospholipids. To the best of our knowledge, this is the first comparative metabolite profiling of the five seeds coupled with chemometrics and computational analysis, which is intended to provide chemical-based evidence for their differential biological effect on Alzheimer's key enzymes.

Amino acids and sugars
Eight amino acids were identified, including choline, tyramine, isovaline, pipecolic acid, Glu-Tyr, isoleucine, Glu-Leu, and Glu-Phe, in addition to one amine derivative, thermospermine. All were detected in positive ionization mode. Amino acids have a neuroprotective role by improving cognitive function and memory performance 54 and ameliorating injury-induced cognitive impairment to prevent the progression of Alzheimer's disease 55. Five sugars were identified in the negative mode, i.e., stachyose, galactosyl ciceritol, raffinose, sucrose, and dehydro-ciceritol (Table S1). In previous works, sugars showed a neuroprotective effect through the elevation of glutathione levels 56.

Lignans and iridoids
To date, all the available data suggest that the iridoids are a family of natural lipophilic chemicals with the features of endogenous neurotrophic factors, which could be considered promising leads for the treatment of neurological disorders 64, mainly through their anti-inflammatory effects 65. Furthermore, lignans are among the compounds revealing nitric oxide (NO)-inhibiting activity, where NO is one of the most studied promoters of neuroinflammation 66. Also, lignans showed inhibitory effects on beta-amyloid (Aβ1-42) aggregation 67.
Three lignans (syringaresinol, secoisolariciresinol diglucoside, and its isomer) were detected and showed main fragments at m/z 179 and 219, which are fragment ions formed through benzylic cleavage and containing one aromatic ring and one hydroxy group (m/z 179) or one hydroxy and one methoxy group (m/z 219) 69.
A set of peaks belonging to group B of soyasaponins, namely soyasaponins Ba, Bb, Bd, and Be, showed mass fragments at m/z 797, 617, 599, and 581 due to successive losses of a hexose moiety along with the neutral loss of one and two molecules of water, respectively. The product ion at m/z 459 is generated from the cleavage of the glycosidic bond and formation of the soyasapogenol B aglycone; the ensuing product ions at m/z 441, 423, and 405 are most likely due to the loss of one, two, and three water molecules, respectively 71. Four sterols, dihydro-dihydroxy megastigmadien-one-[apiosyl-glucoside], coumestrol acetate, β-sitosterol-3-O-glucoside, and stigmastenone, were also detected as non-polar metabolites, in addition to gibberellin A8 (diterpenes), characteristic of the family Fabaceae.

Alkaloids, anthocyanins, and stilbene
Alkaloids are a broad class of naturally occurring compounds that have the potential to impact the central nervous system, regulating several body processes and behavior. They are also regarded as cholinesterase inhibitors that enhance memory functions 70. Two glycoalkaloids were identified in positive ionization mode (Table S1).
Fragmentation of β-chaconine (solanidine-glucose-rhamnose) showed fragment ions at m/z 559 and 397 due to the loss of rhamnose and glucose moieties, respectively 72. Fragments at m/z 204 and 150 are also diagnostic of the fragment formulas C14H22N and C10H16N, respectively 73. N,N′-Diferuloyl-putrescine has antioxidative and skin-whitening activities 74. It was detected in our profile with main fragments at m/z 177, 145, and 117 75.
Anthocyanins are natural pigments commonly detected in some fruits and vegetables. The most common anthocyanidins in foods are cyanidin, peonidin, pelargonidin, petunidin, malvidin, and delphinidin. Anthocyanins have been documented in several clinical studies as being effective in preventing different disorders such as cardiovascular and neurological disorders 76. Three anthocyanins were detected in positive ionization mode, namely pelargonidin-3-O-glucoside, prodelphinidin B3, and procyanidin dimer B7. Prodelphinidin dimer (prodelphinidin B3), consisting of (epi)gallocatechin-(epi)catechin (m/z 593.1305), was tentatively identified. The MS² spectrum of this sequence produced ions at m/z 467, 425, 407, and 289, for heterolytic ring fission of the C ring with a characteristic loss of 126 Da, retro-Diels-Alder fragmentation with a neutral loss of 152 Da, followed by the loss of a water molecule (−18 Da), and quinone methide fission of the interflavan bond producing a distinctive loss of 288 Da, respectively 77. Procyanidin dimer (procyanidins B1, B2, or C1) (m/z 577.1342) was detected and confirmed by MS² at m/z 451, 425, 407, and 289 77. The ion at m/z 451 was attributed to heterolytic ring fission of the C ring with a characteristic loss of 126 Da. The ion at m/z 425 was due to retro-Diels-Alder fragmentation with a neutral loss of 152 Da, followed by the loss of a water molecule (−18 Da) to give m/z 407 [M−H−152−18]−. The ion at m/z 289 was due to quinone methide fission of the interflavan bond producing a distinctive loss of 288 Da. Additionally, pelargonidin-3-O-glucoside was identified with the main fragment at m/z 271, which corresponded to the pelargonidin aglycone, due to the loss of the glucose moiety 78.
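The neutral-loss arithmetic for the procyanidin dimer can be checked with a short worked example. The sketch below uses nominal (integer) masses and the loss values quoted above; the dictionary keys and function name are illustrative labels, not terms from any software used in the study.

```python
# Nominal neutral losses (Da) quoted in the text for procyanidin-type dimers.
NEUTRAL_LOSSES = {
    "HRF": 126,  # heterolytic ring fission of the C ring
    "RDA": 152,  # retro-Diels-Alder fragmentation
    "H2O": 18,   # loss of one water molecule
    "QM": 288,   # quinone methide fission of the interflavan bond
}


def fragment_mz(precursor_mz, losses):
    """Nominal m/z left after removing the named neutral losses from the precursor."""
    return precursor_mz - sum(NEUTRAL_LOSSES[name] for name in losses)


PROCYANIDIN_DIMER = 577  # nominal m/z of the precursor (577.1342 in the text)

fragments = {
    "HRF": fragment_mz(PROCYANIDIN_DIMER, ["HRF"]),             # 451
    "RDA": fragment_mz(PROCYANIDIN_DIMER, ["RDA"]),             # 425
    "RDA + H2O": fragment_mz(PROCYANIDIN_DIMER, ["RDA", "H2O"]),  # 407
    "QM": fragment_mz(PROCYANIDIN_DIMER, ["QM"]),               # 289
}
```

The four computed values reproduce the reported product ions at m/z 451, 425, 407, and 289.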

Fatty acids and lipids
Polyunsaturated fatty acids can alleviate the cognitive deficits of Alzheimer's by limiting amyloid polymerization in neuronal cells 79. The identified fatty acids (Table S1) were malyngic acid, 10-oxo-nonadecanoic acid, hydroxytetracosanoic acid, 16-hydroxypalmitic acid, linolenic acid, linoleic acid, octadecadien-1-ol, and β-acetoxyoleanen-oic acid. In addition, two fatty acyl compounds (tuberonic acid glucoside I and tuberonic acid glucoside II) and one fatty acid derivative (coumaroyl-caffeoyl-palmitic acid derivative) were identified. Furthermore, the LC-MS analysis revealed the presence of 64 lipids from various classes (Table S1). Each is distinguished by a distinct polar moiety ("head group") and differs in its constituent fatty acids, which vary in degree of unsaturation and chain length, resulting in distinct fragmentation patterns for each lipid type. In the MS/MS spectra, the presence of carboxylate ions [RCOO]− identifies the particular fatty acids esterified on the glycerol skeleton, e.g., m/z 281 for octadecenoic acid (18:1), 279 for octadecadienoic (18:2), 277 for octadecatrienoic (18:3), and 255 for hexadecanoic (16:0) acids, where the esterification position can be determined through the relative intensities of both fatty acid ions (the more abundant peak is assigned to the ion at sn-2) 80. Phospholipids could improve cognitive performance and exhibit a neuroprotective effect against Alzheimer's by inhibiting amyloid beta deposition in neural cells 81.
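The diagnostic carboxylate ions follow directly from the fatty acid composition. The sketch below computes the monoisotopic m/z of [RCOO]− for an unbranched, unsubstituted fatty acid C:D (formula CnH(2n−2d)O2), which is an assumption that fits the four examples in the text but not hydroxylated or oxo fatty acids; the function name is hypothetical.

```python
# Monoisotopic masses (Da).
MASS_C = 12.0
MASS_H = 1.00782503
MASS_O = 15.99491462
MASS_PROTON = 1.00727647  # removed on deprotonation to give [M-H]-


def carboxylate_mz(carbons, double_bonds):
    """Monoisotopic m/z of [RCOO]- for a straight-chain fatty acid C:D.

    Assumption: neutral formula C(n)H(2n - 2d)O2, no other substituents.
    """
    hydrogens = 2 * carbons - 2 * double_bonds
    neutral_mass = carbons * MASS_C + hydrogens * MASS_H + 2 * MASS_O
    return neutral_mass - MASS_PROTON


# Nominal values for the four fatty acids named in the text.
nominal = {
    "18:1": round(carboxylate_mz(18, 1)),  # 281
    "18:2": round(carboxylate_mz(18, 2)),  # 279
    "18:3": round(carboxylate_mz(18, 3)),  # 277
    "16:0": round(carboxylate_mz(16, 0)),  # 255
}
```

Rounding to the nearest integer recovers the nominal m/z values quoted above because the mass defect of these ions is well below 0.5 Da.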
Our analysis revealed the presence of various chemical classes in the analyzed extracts, confirming the enrichment of these seeds in multifunctional nutraceuticals against Alzheimer's disease.In consequence, the multivariate data analyses were applied based on the tentatively identified metabolites and their corresponding peak areas to get current insights into the chemical heterogeneity between the Fabaceae seeds and to correlate the key bioactive compounds to the anti-cholinesterase activity of the extracts.

Metabolite profiles comparison and differentiating metabolites analysis
Multivariate statistics were used to further assess the difference in the metabolic profiles among the five legume samples. The unsupervised HCA analysis was performed using the mass data of the identified compounds (Table S1) to classify the five legumes based on the relative differences in the accumulation of secondary metabolites and to identify the holistic discrepancy and similarity in their metabolic profiles in an untargeted and high-throughput manner. The HCA results shown in Fig. 1 demonstrated that the five legumes were clearly divided into two clusters: KB and CP samples shared comparable metabolic profiles, forming cluster I, whereas FB, RL, and BP samples had distinct chemical composition and/or component levels, constituting cluster II. It is worth noting that within cluster II, FB and RL samples showed higher chemical similarity to each other than to the BP sample.
After that, a volcano plot (Fig. 2) was used to obtain further insight into the metabolic differences between these two clusters and to determine the differentially accumulated compounds and their expression levels. A fold-change (FC) score ≥ 2 or ≤ 0.5 among the identified metabolites with p-value < 0.05 was used as the identification criterion. There were 46 differentially accumulated metabolites (31 upregulated and 15 downregulated) between cluster I (KB and CP) and cluster II (FB, RL, and BP) samples. Accordingly, the discriminating metabolites colored red were substantially higher in cluster I while being lower in cluster II (FC ≥ 2.0), whereas those colored blue were significantly lower in cluster I but higher in cluster II (FC ≤ 0.5) (Fig. 2). These metabolites mainly comprised saponins, flavonoids, isoflavonoids, alkaloids, phenolic acid derivatives, phospholipids, and procyanidins, as well as sugars, fatty acyls, and fatty acids.
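The volcano-plot selection rule can be written out as a small sketch. It assumes the fold change is expressed as cluster I over cluster II, which is consistent with the red (FC ≥ 2, higher in cluster I) and blue (FC ≤ 0.5) coloring described above; the function names and example values are hypothetical.

```python
import math


def classify_metabolite(fold_change, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Label a metabolite using the FC >= 2 or FC <= 0.5 and p < 0.05 criterion.

    Assumption: fold_change = mean level in cluster I / mean level in cluster II.
    """
    if p_value >= p_cutoff:
        return "not significant"
    if fold_change >= fc_cutoff:
        return "up"    # higher in cluster I (red in the plot)
    if fold_change <= 1.0 / fc_cutoff:
        return "down"  # lower in cluster I (blue in the plot)
    return "not significant"


def volcano_coordinates(fold_change, p_value):
    """Return (log2 FC, -log10 p), the x and y coordinates of the volcano plot."""
    return math.log2(fold_change), -math.log10(p_value)


# Hypothetical metabolites: (fold change, p-value).
examples = {"A": (4.0, 0.01), "B": (0.25, 0.01), "C": (4.0, 0.20), "D": (1.3, 0.001)}
labels = {name: classify_metabolite(fc, p) for name, (fc, p) in examples.items()}
```

The symmetric cut-offs of 2 and 0.5 correspond to ±1 on the log2 axis, which is why the thresholds appear as mirror-image vertical lines in such plots.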
Further, PCA modeling was performed to provide a general visual separation of all of the samples. At the 95% confidence limit, two principal components described the positions of the distinct metabolome clusters, with PC1/PC2 accounting for 79.3% of the variance in the metabolic profiles of the analyzed extracts. In the PCA scores plot (Fig. 3A), the 3 biological replicates of each sample were closely grouped together, affirming the consistency of the extraction method as well as the stability and repeatability of the data analysis. It was noticeable that the metabolic profiles of the CP and KB extracts are grouped on the far-right side of the plot (positive PC1 values) and are fairly distinctive and separated from the other three samples (FB, RL, and BP), located at the left side of the plot (negative PC1 values). The results were similar to the HCA model, with the five samples grouped into two distinct areas in the plot. Examination of the loading plot (Fig. 3B) [...]
Unlike PCA, PLS-DA is a supervised multivariate analysis method that can maximize the differences between groups by using partial least squares regression to model the relationship between metabolite expression and sample class, enabling predictive modeling of the studied samples. Therefore, a five-class PLS-DA model (Fig. 4) was used to identify the metabolites that were responsible for the separation observed in PCA. High predictability (Q²) and strong goodness of fit (R²X, R²Y) of the PLS-DA model were observed (Q² = 0.91, R²X = 0.73, R²Y = 0.93). The fivefold CV-ANOVA and permutation of the cross-validation test (20 iterations) revealed great predictability and goodness of fit of the constructed PLS-DA model (Fig. S3). As can be observed in the PLS-DA scores plot (Fig. 4A), the five legumes were clearly separated from each other, and the variable importance in the projection (VIP) value of the first principal component of the PLS-DA model was used at p < 0.05 to find the unique chemical markers for each sample (Fig. 4B). Accordingly, the useful markers of the BP sample are phaseoside IV, β-chaconine, soyasaponin I, thermospermine, and solanidine. Likewise, the CP extract was enriched in genistein 7-O-apiofuranosyl-(1 → 6)-glucoside, sissotrin, and genistin. In contrast, the FB sample was characterized by a high abundance of phaseoside I, hydroxy ferutinin, and isoleucine. On the other

Metabolic pathway enrichment analysis based on KEGG database
The Kyoto Encyclopedia of Genes and Genomes (KEGG) database was used to link the identified metabolites (Table S1) to their metabolic pathways. Metabolite set enrichment analysis (MSEA) was utilized to classify the chemical groups of all identified compounds and highlight the most enriched metabolic pathways in the analyzed legume samples (Fig. 5). The enrichment bubble diagram (Fig. 5A) represents the chemical classifications of the enriched metabolite sets (top 25). As can be observed, among the top 25 chemical classes, the chemical groups with a higher enrichment ratio were cholines, oligosaccharides, fatty acid conjugates, flavonoids, glycerophosphoglycerols, prenol lipids, benzamides, octadecanoids, isoprenoids, and fatty acyl glycosides. The metabolic pathways of the metabolites were also analyzed according to the KEGG database, which reflected the most significant biochemical metabolic pathways involving the identified metabolites in the five legumes. The identified metabolites covered a total of 29 pathways or metabolisms (Fig. S4, Table S2), and the top 15 enriched pathway terms are shown in the KEGG enrichment bar chart by calculating the −log(P-value) of each pathway, including glycerophospholipid metabolism, glycine, serine and threonine metabolism, unsaturated fatty acids biosynthesis, linoleic acid metabolism, ether lipid metabolism, valine, leucine and isoleucine biosynthesis, flavonoid biosynthesis, anthocyanin biosynthesis, and galactose metabolism (Fig. 5B). However, most metabolic reactions involved multiple metabolites, and the variation in the amounts of these metabolites among the five legume samples was inconsistent. Therefore, it cannot be simply said that the expression of some metabolic pathways was increased or decreased in a certain legume species. Previous studies reported that the major metabolic pathways in legumes included flavone and flavonol biosynthesis, aminoacyl-tRNA biosynthesis, isoquinoline alkaloid biosynthesis, the biosynthesis of amino acids, and isoflavonoid biosynthesis 82,83, which is in accordance with our results (Table S2). Interestingly, many of the enriched metabolic pathways in the five legumes were associated with the biosynthesis of plant secondary metabolites, such as sterols, saponins, alkaloids, isoprenoids, and flavonoids. Secondary metabolites in plants are non-essential small organic molecules generated by secondary metabolism and often have bioactivity. As a result, metabolic pathway analysis is beneficial for investigating complicated biological processes that occur throughout the metabolite accumulation process in plants 84. Indeed, the characterized chemical classes in the analyzed legumes, such as flavonoids, cinnamic acids, benzoic acids, alkaloids, and sterols, have been proven to exhibit strong antioxidant and anti-cholinesterase activities 85-87. In this context, potential metabolites identified in metabolic pathways might serve as therapeutic targets and contribute to the development of broad-spectrum drugs. Such analysis can aid in the development of testable predictions, the understanding of drug action mechanisms, and the increase of research productivity towards novel drug discovery 88.
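Over-representation P-values of the kind ranked in an enrichment bar chart are commonly obtained from a hypergeometric test, the test named in the Fig. 5 caption. The sketch below shows that calculation under stated assumptions: the counts are invented, the function name is hypothetical, and a base-10 logarithm is assumed for the −log(P-value) transformation because the text does not give the base. It is not a reproduction of the platform's internal code.

```python
from math import comb, log10


def hypergeom_enrichment_p(population, pathway_size, sample_size, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population   : total number of metabolites in the reference set
    pathway_size : reference metabolites annotated to the pathway
    sample_size  : identified metabolites submitted to the analysis
    hits         : identified metabolites that fall in the pathway
    """
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
p_value = hypergeom_enrichment_p(population=1000, pathway_size=40,
                                 sample_size=100, hits=12)
score = -log10(p_value)  # larger score = stronger enrichment
```

Because many pathways are tested at once, a multiple-testing correction of the resulting P-values would normally be applied before ranking; which correction to use is a separate design choice.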

Figure 1. Hierarchical cluster analysis (HCA) dendrogram of the five analyzed legumes based on cluster analysis of mass spectrometric biochemical profiles.

Figure 2. Volcano plot of differential metabolites of cluster I (KB, CP) vs. cluster II (FB, RL, BP). The fold-change value (FC) for each differential metabolite was transformed as log2, and the corresponding p-value was transformed as −log10.

Figure 3. Principal component analysis (PCA) scores plot (A) and loadings plot visualization (B) using the identified metabolites by LC/MS analysis of the five legume samples (n = 3).

Figure 4. Partial least squares discriminant analysis (PLS-DA) scores plot (A) and the variable importance in projection (VIP) score showing the top 25 differential metabolites (VIP scores > 1) (B) in the methanolic extracts of the five analyzed legumes.

Figure 5. (A) Classification of the identified metabolites in the five analyzed legumes. The color of the dots represents the transformed P-value of the hypergeometric test, and the size represents the number of differential metabolites: the larger the size, the greater the number of differential metabolites within the chemical class. (B) Top 15 enriched KEGG pathways. The horizontal coordinate indicates the ratio of the number of differential metabolites in the corresponding pathway to the total number of identified metabolites in this pathway (the larger the ratio, the greater the enrichment of the pathway), and the vertical coordinate indicates the name of the pathway. The color intensity reflects the statistical significance of the identified pathways: the darker the color, the more affected the pathway.